基于DNN的视频对象检测(VOD)为自动驾驶和视频监视行业提供了重要的重要性和有希望的机会。但是,由于其实用性,可行性和强大的攻击效果,对抗贴片攻击在现场视觉任务中产生了巨大的关注。这项工作提出了Themis,这是一种软件/硬件系统,可防止对抗贴片,以实时稳健的视频对象检测。我们观察到,对抗斑块在具有非稳定预测的小区域中表现出极为局部的表面特征,因此提出了对抗区域检测算法,以消除对抗性效应。Themis还提出了一种系统的设计,以通过消除冗余计算和记忆运输来有效地支持该算法。实验结果表明,提出的方法可以有效地从可忽略的硬件开销中从对抗性攻击中恢复系统。
translated by 谷歌翻译
社交媒体意见两极分化的大量工作集中在媒体痕迹不同社区的立场(或正交信念)的平坦分类。我们在两个重要方面扩展了这项工作。首先,我们不仅检测到社区之间的分歧点,而且还检测到一致性点。换句话说,我们在存在重叠的情况下估计社区信念。其次,代替平坦的分类,我们考虑了层次的信念估计,在该估计中,社区可能会分层。例如,两个反对党可能在核心问题上不同意,但是在一方,尽管同意基本面,但在进一步的细节上可能会出现分歧。我们称由此产生的组合问题为分层重叠的信念估计问题。为了解决它,本文开发了一类新的无监督的非负矩阵分解(NMF)算法,我们称信仰结构化矩阵分解(BSMF)。我们提出的无监督算法捕获了潜在的信仰交叉点和差异性以及等级结构。我们讨论算法的属性,并在合成数据集和现实世界数据集上进行评估。在合成数据集中,我们的模型将误差降低了40%。在实际的Twitter痕迹中,它的准确性提高了约10%。该模型还可以在理智检查中实现96.08%的自洽性。
translated by 谷歌翻译
在实践中,在实践中应用机器学习算法的瓶颈缺乏大规模标记的数据。转移学习是利用其他数据来改善下游性能的流行策略,但是找到最相关的数据可能是具有挑战性的。神经数据服务器(NDS)是一种为给定的下游任务提供相关数据的搜索引擎,以前已被提议解决此问题。 NDS使用经过数据源培训的专家组合,以估计每个源和下游任务之间的相似性。因此,每个用户的计算成本都随着来源的数量而增长。为了解决这些问题,我们提出了可扩展的神经数据服务器(SND),这是一种大规模搜索引擎,理论上可以索引数千个数据集以将相关的ML数据提供给最终用户。 SND在初始化过程中训练专家在中介数据集上的混合物,并通过与中介数据集的近距离表示数据源和下游任务。因此,随着新数据集添加到服务器中,SNDS用户产生的计算成本仍然固定。我们验证SND在许多现实世界任务上,发现SNDS推荐的数据改善了基线的下游任务性能。我们还通过显示其选择相关数据以在自然图像设置之外传输的能力来证明SND的可伸缩性。
translated by 谷歌翻译
虽然在巨大数据上培训的机器学习模型导致了几个领域的断路器,但由于限制数据的访问,他们在隐私敏感域中的部署仍然有限。在私有数据上具有隐私约束的生成模型可以避免此挑战,而是提供对私有数据的间接访问。我们提出DP-Sinkhorn,一种新的最优传输的生成方法,用于从具有差异隐私的私有数据学习数据分布。 DP-Sinkhorn以差别私人方式在模型和数据之间的模型和数据之间最小化陷阱的分歧,将计算上有效的近似值,并在模型和数据之间使用新技术来控制梯度估计的偏差差异的偏差折衷。与现有的培训方法不同,差异私人生成模型主要基于生成的对抗网络,我们不依赖于对抗性目标,这令人惊叹的难以优化,特别是在隐私约束所施加的噪声存在下。因此,DP-Sinkhorn易于训练和部署。通过实验,我们改进了多种图像建模基准的最先进,并显示了差异私有的信息RGB图像综合。项目页面:https://nv-tlabs.github.io/dp-sinkhorn。
translated by 谷歌翻译
近年来,大规模的深层模型取得了巨大的成功,但巨大的计算复杂性和大规模的存储要求使其在资源限制设备中部署它们是一个巨大的挑战。作为模型压缩和加速度方法,知识蒸馏通过从教师探测器转移黑暗知识有效提高了小型模型的性能。然而,大多数基于蒸馏的检测方法主要模仿近边界盒附近的特征,这遭受了两个限制。首先,它们忽略边界盒外面的有益特征。其次,这些方法模仿一些特征,这些特征被教师探测器被错误地被视为背景。为了解决上述问题,我们提出了一种新颖的特征性 - 丰富的评分(FRS)方法,可以选择改善蒸馏过程中的广义可检测性的重要特征。所提出的方法有效地检索边界盒外面的重要特征,并消除边界盒内的有害特征。广泛的实验表明,我们的方法在基于锚和无锚探测器上实现了出色的性能。例如,具有Reset-50的RetinAnet在Coco2017数据集上达到39.7%,甚至超过基于Reset-101的教师检测器38.9%甚至超过0.8%。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many know triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to challenge the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译